ProvGen: Generating Synthetic PROV Graphs with Predictable Structure

Authors

  • Hugo Firth
  • Paolo Missier
Abstract

This paper introduces provGen, a generator aimed at producing large synthetic provenance graphs with predictable properties and of arbitrary size. Synthetic provenance graphs serve two main purposes. Firstly, they provide a variety of controlled workloads that can be used to test storage and query capabilities of provenance management systems at scale. Secondly, they provide challenging testbeds for experimenting with graph algorithms for provenance analytics, an area of increasing research interest. provGen produces PROV graphs and stores them in a graph DBMS (Neo4J). A key feature is to let users control the relationship makeup and topological features of the graph, by providing a seed provenance pattern along with a set of constraints, expressed using a custom Domain Specific Language. We also propose a simple method for evaluating the quality of the generated graphs, by measuring how realistically they simulate the structure of real-world patterns.
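To make the idea of growing a graph from a seed pattern under constraints concrete, here is a minimal Python sketch. It is not provGen's DSL, algorithm, or Neo4j storage layer, none of which are specified in the abstract. The function name `generate_prov_graph`, the `max_used` parameter, and the two constraints are assumptions chosen for illustration; only the relation names `used` and `wasGeneratedBy` come from the PROV vocabulary.

```python
import random


def generate_prov_graph(n_activities, max_used=2, seed=0):
    """Grow a PROV-like graph by repeating a seed pattern.

    Hypothetical seed pattern: an activity uses existing entities
    and generates exactly one new entity.
    Returns (entities, activities, edges), where each edge is a
    (source, relation, target) triple.
    """
    rng = random.Random(seed)   # fixed seed gives a reproducible graph
    entities = ["e0"]           # one initial entity to start from
    activities = []
    edges = []

    for i in range(n_activities):
        act = f"a{i}"
        activities.append(act)

        # Assumed constraint 1: each activity uses between 1 and
        # max_used entities that already exist.
        k = rng.randint(1, min(max_used, len(entities)))
        for ent in rng.sample(entities, k):
            edges.append((act, "used", ent))

        # Assumed constraint 2: each new entity is generated by
        # exactly one activity.
        new_ent = f"e{i + 1}"
        edges.append((new_ent, "wasGeneratedBy", act))
        entities.append(new_ent)

    return entities, activities, edges


if __name__ == "__main__":
    ents, acts, rels = generate_prov_graph(5, max_used=2, seed=42)
    for source, relation, target in rels:
        print(f"{source} -{relation}-> {target}")
```

Because every `used` edge points at an entity created in an earlier step, the result is acyclic by construction, and the size of the graph is controlled directly by `n_activities`. A real generator would additionally need to express the constraints declaratively and persist the result to a graph store.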


Similar resources

Modeling Physical Variability for Synthetic MOUT Agents

Generating behavioral variability is an important prerequisite in the development of synthetic MOUT (Military Operations in Urban Terrain) agents for military simulations. Agents that lack variability are predictable and ineffective as opponents and teammates for human trainees. Along with cognitive differences, physical differences contribute towards behavioral variability. In this paper, we d...

Crowdsourcing data citation graphs using provenance

In this paper we describe a tool designed to support crowdsourcing a posteriori provenance information about the datasets used in research publications. It generates PROV data to capture both the data citation graphs, via an extension to the PROV Data Model, and the crowdsourcing process, via prov:bundles.

Towards the Domain Agnostic Generation of Natural Language Explanations from Provenance Graphs for Casual Users

As more systems become PROV-enabled, there will be a corresponding increase in the need to communicate provenance data directly to users. Whilst there are a number of existing methods for doing this — formally, diagrammatically, and textually — there are currently no application-generic techniques for generating linguistic explanations of provenance. The principal reason for this is that a cert...

Graph Hybrid Summarization

One approach to processing and analyzing massive graphs is summarization. Generating a high-quality summary is the main challenge of graph summarization. To generate a better-quality summary for a given attributed graph, both structural and attribute similarities must be considered. There are two measures named density and entropy to evaluate the quality of structural and at...

Synthetic Graph Generation from Finely-Tuned Temporal Constraints

Large-scale graphs are at the core of a plethora of modern applications such as social networks, transportation networks, or the Semantic Web. Such graphs naturally evolve over time, which makes graph processing tasks such as graph mining particularly challenging. To be able to realize rigorous empirical evaluations of research ideas, the graph processing community needs finely-tuned genera...



Publication date: 2014